Noise Suppression Method and Apparatus

ABSTRACT

The present invention relates to a method and apparatus of a digital filter for noise suppression of a signal representing an acoustic recording. The method comprises determining a desired frequency response (H(ω)) of the digital filter; and generating a noise suppression filter based on the desired frequency response. The desired frequency response is determined in a manner so that the desired frequency response does not exceed a maximum level, wherein the maximum level is determined in response to the signal to be filtered.

TECHNICAL FIELD

The present invention relates to the field of digital filter design. In particular, the invention relates to the design of digital filters for noise suppression in signals representing acoustic recordings.

BACKGROUND

Due to the ubiquitous presence of noise in natural environments, real-world sound recordings typically contain noise from various sources. In order to improve the sound quality of sound recordings, a range of methods for reducing the noise level of sound recordings have been developed. Often, in such methods, a time-domain noise suppression filter is computed from a desired frequency response H(ω), and the time-domain noise suppression filter is then applied to the sound recording.

In an ideal noise suppression filter, the desired acoustic signal should pass through the filter undistorted, while noise should be completely attenuated. These properties cannot be simultaneously fulfilled in a real filter (except in the special case when there is no desired signal or no noise, or when the desired signal and noise are spectrally separated). Hence, in determining a desired frequency response H(ω) of a filter, a trade-off between distorting the desired signal and distorting the noise has to be made for frequencies at which both the desired signal and noise are present.

The desired frequency response H(ω) can be estimated by means of various methods, such as spectral subtraction. In “Low-distortion spectral subtraction for speech enhancement”, Peter Händel, Conference Proceedings of Eurospeech, pp. 1549-1553, ISSN 1018-4074, 1995, different aspects of spectral subtraction methods for suppressing noise are discussed. In U.S. Pat. No. 5,706,395, spectral subtraction is discussed and a method of defining the level to which noise should be attenuated is disclosed. In U.S. Pat. No. 5,706,395, the desired frequency response H(ω) is clamped so that the attenuation cannot go below a minimum value, wherein the minimum value may, according to U.S. Pat. No. 5,706,395, depend on the signal-to-noise ratio of the noisy speech signal to be filtered. The clamping of the desired frequency response of U.S. Pat. No. 5,706,395 prevents a noise suppression filter from fluctuating around very small values, thus avoiding a noise distortion commonly referred to as musical noise.

In many spectral subtraction methods, the desired frequency response is calculated as a function of the signal-to-noise ratio (SNR). Since the SNR of a noisy acoustic signal at a particular frequency varies with time, the desired frequency response H(ω) is generally updated over time; often, the desired frequency response H(ω) is updated for each frame of data. An effect of this is that a noise, which is at a constant level in the noisy speech signal, is often attenuated to a level that varies considerably with time in a noticeable manner, resulting in fluctuations of the residual noise. This undesirable effect is commonly referred to as noise pumping, and can be heard as a shadow voice.

SUMMARY

A problem to which the present invention relates is the problem of how to avoid undesirable fluctuations in the residual noise.

This problem is addressed by a method of designing a digital filter for noise suppression of a signal to be filtered wherein the signal represents an acoustic recording. The method comprises: determining a desired frequency response of the digital filter and generating a noise suppression filter based on the desired frequency response. The method is characterised in that the determining of a desired frequency response is performed in a manner so that the desired frequency response does not exceed a maximum level, wherein the maximum level is determined in response to the signal to be filtered.

The problem is further addressed by a digital filter design apparatus arranged to design a digital filter for noise suppression of a signal to be filtered, wherein the signal represents an acoustic recording. The digital filter design apparatus comprises a desired frequency response determination apparatus arranged to determine a desired frequency response in response to the signal to be filtered, wherein the desired frequency response determination apparatus is arranged to determine a maximum level of the desired frequency response in dependence of the signal to be filtered; and determine the desired frequency response in a manner so that the desired frequency response does not exceed the maximum level.

The problem is also addressed by a computer program product arranged to perform the inventive method.

By determining a maximum level of the desired frequency response of the designed filter in response to the signal to be filtered, undesirable fluctuations in the residual noise can be reduced, and hence, the perceived acoustic quality of the acoustic signal can be improved. For example, if the power density of the signal to be filtered varies with time, the maximum level can be varied at a time scale that is adapted to the time scale of the power density variations in a manner so that the effects on the filtered signal of the power density variations are minimised.

Moreover, the maximum level can also be determined as a function of frequency. By allowing the maximum level to vary with the frequency of the signal to be filtered, the perceived quality of the filtered signal can be improved even further. For example, at low frequencies which typically contain only noise, the maximum level can be set to a lower value than at high frequencies, where speech is often present.

The maximum level of the desired frequency response may advantageously be determined based on a measure of the noise level of the signal to be filtered, such as the signal-to-noise ratio or the noise power.

Further advantageous embodiments of the invention are set out by the dependent claims.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic illustration of a digital filter design apparatus.

FIG. 2 a is a flowchart illustrating an embodiment of the inventive method.

FIG. 2 b is a flowchart illustrating an embodiment of the inventive method.

FIG. 3 is a schematic illustration of a desired response determination apparatus according to an embodiment of the invention.

FIG. 4 a is a schematic illustration of a user equipment incorporating a digital filter design apparatus according to the invention.

FIG. 4 b is a schematic illustration of a node in a communications system wherein the node comprises a digital filter design apparatus according to the invention.

FIG. 5 a illustrates results of simulations of signal filtering, wherein a conventional filter design method has been used.

FIG. 5 b illustrates results of simulations of signal filtering, wherein a filter design method according to the invention has been used.

DETAILED DESCRIPTION

A noisy speech signal y(t) having a desired speech component s(t) and a noise component n(t) may be denoted:

y(t)=s(t)+n(t).  (1)

In many situations, it is desirable to suppress the noise component n(t) and form an estimate ŝ(t) of the speech component in a manner so that the estimated speech component ŝ(t) resembles the speech component s(t) as closely as possible. One way to do this is by filtering the noisy signal y(t) with a time-domain noise suppression filter h(z) which is designed to remove as much of the noise component n(t) as possible, while retaining as much of the speech component s(t) as possible.

The noise suppression filter h(z) is usually computed from a desired frequency response H(ω), where H(ω) is a real-valued function that is typically designed so that H(ω) is close to zero for frequencies ω at which y(t) only contains noise, H(ω)=1 for frequencies ω at which y(t) only contains speech, and 0<H(ω)<1 for frequencies ω at which y(t) contains noisy speech.

When determining the speech component of a noisy signal, a linear transform F[•] is normally applied to frames of samples of the noisy signal. By assuming the following relation:

F[ŝ(t)]=H(ω)F[y(t)]  (2)

where F[•] denotes a linear transform such as the Fast Fourier Transform (FFT), the noise suppression filter h(z) is obtained as the inverse linear transform F⁻¹[•] of the desired frequency response H(ω). Thus, the speech component estimate ŝ(t) is obtained by:

ŝ(t)=F⁻¹[H(ω)] ⊛ y(t)=h(z) ⊛ y(t)  (3)

where ⊛ denotes convolution.
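By way of a non-limiting illustration, the equivalence between expressions (2) and (3) may be checked numerically. The sketch below assumes that the linear transform F[•] is a discrete Fourier transform applied to a single frame, in which case the convolution in expression (3) is a circular convolution; the frame y, the response H and all function names are hypothetical values chosen for the example and do not appear in the description above.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform of one frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circular_convolve(h, y):
    """Circular convolution of two equal-length sequences."""
    n = len(y)
    return [sum(h[m] * y[(t - m) % n] for m in range(n)) for t in range(n)]

# Hypothetical 8-sample frame and a real, symmetric desired response H(w).
y = [0.0, 1.0, 0.5, -0.3, 0.8, -1.0, 0.2, 0.1]
H = [1.0, 0.8, 0.4, 0.1, 0.1, 0.1, 0.4, 0.8]

# Expression (2): multiply the transform of y by H, then transform back.
s_hat_freq = idft([Hk * Yk for Hk, Yk in zip(H, dft(y))])

# Expression (3): h = F^-1[H], then convolve h with y in the time domain.
h = idft(H)
s_hat_time = circular_convolve(h, y)

# Both routes agree up to rounding error.
max_difference = max(abs(a - b) for a, b in zip(s_hat_freq, s_hat_time))
```

In a practical filter, the inverse transform of H(ω) would typically be processed further before being applied to the signal, so the circular convolution above should be read only as a check of the underlying relation.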

Hence, in order to arrive at a speech component estimate ŝ(t), the desired frequency response H(ω) has to be determined. As mentioned above, 0<H(ω)<1 for frequencies ω at which y(t) contains noisy speech. The value of H(ω) at a particular frequency at which y(t) contains noisy speech is often chosen in dependence of the Signal-to-Noise Ratio (SNR) of the noisy signal y(t) at that frequency.

The desired frequency response H(ω) can be estimated by means of various methods, such as spectral subtraction. Since the SNR at a particular frequency varies with time, the desired frequency response H(ω) is generally updated over time; often, the desired frequency response H(ω) is updated for each frame of data. Hence, the desired frequency response H(ω) typically varies between frames, so that H(k_(n),ω)≠H(k_(n+1),ω), where k_(n) denotes the timing of a frame having frame number n. Alternatively, the desired frequency response H(ω), and hence the filter arrangement determined from the desired frequency response, can be updated at a different time interval. Thus, the desired frequency response and the filter arrangement vary with time. However, in order to simplify the description, this time dependency of H(ω) and h(z) will, in the expressions below, generally not be explicitly shown.

When determining the desired frequency response H(ω) in a spectral subtraction method, the following expression is often used:

$\begin{matrix}{{H(\omega)} = {( {1 - {{\delta (\omega)}( \frac{{\hat{\Phi}}_{n}(\omega)}{{\hat{\Phi}}_{y}(\omega)} )^{y_{1}}}} )^{y_{2}}.}} & (4)\end{matrix}$

where Φ̂_(n)(ω) and Φ̂_(y)(ω) are estimates of the power spectral densities of n(t) and y(t) respectively, and δ(ω) is an over-subtraction factor used to reduce musical noise. As discussed above, it is often advantageous to limit the suppression of noise to a level H_(min) in order to limit small fluctuations of the residual noise, often denoted musical noise. Expression (4) then takes the form:

$\begin{matrix}{{H(\omega)} = {\max {\{ {( {1 - {{\delta (\omega)}( \frac{{\hat{\Phi}}_{n}(\omega)}{{\hat{\Phi}}_{y}(\omega)} )^{y_{1}}}} )^{y_{2}},H_{\min}} \}.}}} & ( {4a} )\end{matrix}$

γ₁ and γ₂ are factors determining the sharpness of the transition between H(ω)≈1 and H(ω)=H_(min). When γ₁=γ₂=1, expression (4) is often denoted the Wiener filtering approach.
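As a minimal sketch of expression (4a), the per-frequency-bin gain may be computed as shown below. The function name, the default parameter values and the handling of bins in which the subtraction yields a non-positive value (which are simply floored before H_(min) is applied) are assumptions made for the purpose of illustration.

```python
def spectral_subtraction_gain(phi_n, phi_y, delta=1.0, gamma1=1.0,
                              gamma2=1.0, h_min=0.1):
    """Evaluate expression (4a) for each frequency bin.

    phi_n, phi_y: estimated power spectral densities of the noise and of
    the noisy signal, one value per bin. The defaults are illustrative.
    """
    gains = []
    for pn, py in zip(phi_n, phi_y):
        if py <= 0.0:
            # Assumption: an empty bin is treated as noise only.
            gains.append(h_min)
            continue
        base = 1.0 - delta * (pn / py) ** gamma1
        # Assumption: a non-positive base (over-subtraction) is floored to
        # zero so that a fractional exponent gamma2 stays well defined.
        gain = base ** gamma2 if base > 0.0 else 0.0
        gains.append(max(gain, h_min))
    return gains

# Three hypothetical bins: noise only, moderate SNR, high SNR.
gains = spectral_subtraction_gain([1.0, 1.0, 1.0], [1.0, 2.0, 10.0])
```

With γ₁=γ₂=1 and δ(ω)=1, as in the example call, the gain approaches 1 as the noisy-signal power grows relative to the noise estimate, and is held at H_(min) where the bin contains noise only.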

FIG. 1 illustrates a filter design apparatus 100 arranged to generate an appropriate noise suppression filter h(z) based on a received sampled noisy speech signal y(t). Filter design apparatus 100 has an input 103 for receiving the noisy speech signal y(t) to be filtered, and an output 104 for outputting a signal representing the designed digital filter h(z). Filter design apparatus 100 comprises a linear transform apparatus 105 arranged to receive the sampled noisy speech signal y(t) and to generate the linear transform Y(ω) of the sampled noisy speech signal y(t). Filter design apparatus 100 of FIG. 1 further comprises a desired response determination apparatus 110 arranged to receive the linear transform Y(ω) of the sampled signal y(t) and to determine the desired frequency response H(ω) based on the linear transform Y(ω). Filter design apparatus 100 further comprises a filter signal generation apparatus 112 comprising an inverse linear transform apparatus 115 arranged to receive the desired frequency response H(ω) and to generate the inverse linear transform of the desired frequency response H(ω). Generally, the output of the inverse linear transform apparatus 115 is further processed in filter signal generation apparatus 112, for example in the manner described in U.S. Pat. No. 7,251,271, in order to obtain the filter h(z). The output of the filter signal generation apparatus 112 is a signal representing the filter h(z), and the output of filter signal generation apparatus 112 is advantageously connected to output 104 of filter design apparatus 100.

In an ideal noise suppression technique, any speech should pass undistorted. Hence, H(ω) should fulfil H(ω)=1 for all frequencies at which the noisy speech signal y(t) comprises a speech component s(t). On the other hand, an ideal noise suppression technique should attenuate any noise to a desired noise level H_(min), requiring that H(ω)=H_(min) for all frequencies at which the noisy speech signal y(t) comprises a noise component n(t).

The desired properties above can generally not be fulfilled at the same time, since speech and noise are often simultaneously present at the same frequencies. Hence, in determining a desired frequency response H(ω) of a filter, a trade-off between distorting the speech and distorting the residual noise has to be made for frequencies at which both speech and noise are present. When H(ω)≠1 at frequencies at which speech is present, the speech is said to be distorted. When H(ω)≠H_(min) at frequencies at which noise is present, the residual noise is said to be distorted, where the residual noise is defined as

n^(residual)(t)=h(z) ⊛ n(t).  (5)

According to the invention, the desired frequency response is selected in a manner so that an appropriate maximum level of H(ω) is applied, wherein the maximum level is selected in response to the noisy speech signal y(t). As will be seen below, the maximum level may be chosen such that the distortions in the speech and residual noise may be limited in a controlled manner. Fluctuations of the noise attenuation, as well as other effects of noise and speech distortion, may thereby be reduced.

In FIG. 2 a, a flowchart illustrating an inventive method of determining the desired frequency response H(ω) is shown. In step 205, a maximum level H_(max) of the desired frequency response is determined in dependence of the noisy speech signal y(t); more specifically, the maximum level H_(max) can advantageously be determined in dependence of the linear transform Y(ω) of the noisy speech signal y(t). H_(max) could be determined based on the present time instance of the noisy speech signal y(t), i.e. the time instance of the noisy speech signal to which the instance to be determined of the filter h(z) is to be applied; on time instance(s) of the noisy speech signal y(t) that precede the time instance to which the instance to be determined of the filter h(z) is to be applied; or on a combination of present and previous time instances of the noisy speech signal y(t). H_(max) may or may not be a function of frequency ω. In order to reflect this possibility, the maximum level of H(ω) will in the following be denoted H_(max)(ω). Furthermore, H_(max)(ω) may or may not vary between different points in time. However, this variation will in the following generally not be explicitly shown. H_(max)(ω) can be determined in a number of different ways, of which some are described below.

When H_(max)(ω) has been determined in step 205, step 210 is entered, wherein the desired frequency response H(ω) is determined in accordance with H_(max)(ω). In one implementation of the invention, H(ω) could for example be chosen to be equal to H_(max)(ω) for all frequencies ω above a change-over frequency ω₀, and be equal to a minimum level H_(min) of the desired frequency response for frequencies lower than ω₀. In this implementation, the change-over frequency ω₀ could for example be determined as the frequency below which the power of the speech component s(t) of the noisy speech signal is smaller than a threshold value, or in any other suitable manner.
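The change-over implementation of step 210 described above may be sketched as follows. The function names are hypothetical, the speech power per bin is assumed to be available as an estimate (for example the difference between the estimated power spectral densities of y(t) and n(t)), and the bin located exactly at ω₀ is assumed to receive the value H_(max)(ω), a boundary case on which the description is silent.

```python
def changeover_frequency(freqs, speech_power, threshold):
    """Lowest frequency at which the estimated speech power reaches the
    threshold; below it, the speech power is smaller than the threshold."""
    for w, p in zip(freqs, speech_power):
        if p >= threshold:
            return w
    # Assumption: with no speech in any bin, every bin is treated as lying
    # below the change-over frequency.
    return float("inf")

def response_with_changeover(freqs, h_max, h_min, omega_0):
    """H(w) = H_max(w) at and above omega_0, H_min below omega_0."""
    return [hm if w >= omega_0 else h_min for w, hm in zip(freqs, h_max)]

# Hypothetical values for four frequency bins (frequencies in Hz).
freqs = [100.0, 200.0, 300.0, 400.0]
speech_power = [0.01, 0.02, 0.5, 0.3]
h_max = [0.9, 0.9, 0.8, 0.7]

omega_0 = changeover_frequency(freqs, speech_power, threshold=0.1)
H = response_with_changeover(freqs, h_max, h_min=0.1, omega_0=omega_0)
```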

FIG. 2 b illustrates an implementation of the inventive method wherein the step 210 of determining the desired frequency response is performed in dependence of an approximation H^(approx)(ω) of the desired frequency response, as well as in dependence of the maximum level H_(max)(ω). In step 205 of FIG. 2 b, the maximum level H_(max)(ω) is determined (cf. FIG. 2 a). Step 207 is then entered, in which an approximation H^(approx)(ω) of the desired frequency response is determined based on the linear transform Y(ω) of the sampled signal y(t). This approximation H^(approx)(ω) of the desired frequency response can for example be obtained by use of expression (4). Step 210 is then entered, in which a value of H(ω) is determined based on a comparison between the approximation H^(approx)(ω) of the desired frequency response and the maximum value H_(max)(ω) of the desired frequency response. Such determination could for example be performed by use of the following expression:

H(ω)=min {H ^(approx)(ω),H _(max)(ω)}  (6).

The selection expressed by expression (6) should preferably be made for each frequency bin for which a value of H(ω) should be determined. Hence, step 210 of FIG. 2 b should preferably be repeated for each frequency bin for which a value of H(ω) should be determined. However, there may be situations where the limitation of the maximum level of the desired frequency response is less advantageous for some parts of the frequency spectrum. In implementations addressing such situations, step 210 should only be repeated for the frequency bins for which a limitation of the maximum value of the desired frequency response is desired.

Step 207 could alternatively be performed prior to step 205.

A check as to whether the value H^(approx)(ω) is smaller than a minimum value of the desired frequency response, H_(min), could be included in the method of FIG. 2 b (as well as in the method of FIG. 2 a).

Expression (6) could then advantageously be altered as follows:

H(ω)=max {min {H ^(approx)(ω),H _(max)(ω)},H _(min)}  (6a)

or as follows:

H(ω)=min {max {H ^(approx)(ω),H _(min) },H _(max)(ω)}  (6b)

Whether to use expression (6a) or (6b) depends on which value it is desired that H(ω) takes when H_(min)>H_(max): with expression (6a), H(ω) then takes the value H_(min), whereas with expression (6b), H(ω) takes the value H_(max)(ω). Just like H_(max)(ω), H_(min) could vary with frequency, and could take different values at different points in time.
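The difference between expressions (6), (6a) and (6b) may be illustrated for a single frequency bin by the following sketch, in which the function names and numerical values are hypothetical. The two nested forms give the same result whenever H_(min) does not exceed H_(max), and differ only in which limit prevails when it does.

```python
def limit_6(h_approx, h_max):
    """Expression (6): apply the maximum level only."""
    return min(h_approx, h_max)

def limit_6a(h_approx, h_max, h_min):
    """Expression (6a): the minimum level is applied last."""
    return max(min(h_approx, h_max), h_min)

def limit_6b(h_approx, h_max, h_min):
    """Expression (6b): the maximum level is applied last."""
    return min(max(h_approx, h_min), h_max)

# Ordinary case, H_min <= H_max: both forms agree.
same_high = (limit_6a(0.8, 0.6, 0.1), limit_6b(0.8, 0.6, 0.1))
same_low = (limit_6a(0.05, 0.6, 0.1), limit_6b(0.05, 0.6, 0.1))

# Conflicting case, H_min > H_max: the limit applied last prevails.
conflict_6a = limit_6a(0.5, 0.2, 0.3)   # takes the value of H_min
conflict_6b = limit_6b(0.5, 0.2, 0.3)   # takes the value of H_max
```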

As mentioned above, H_(max)(ω) could be set to a fixed value, which applies to all frequencies and/or all points in time. When H_(max)(ω) is independent of time and frequency, a value of H_(max)<1 would serve to limit the difference in noise suppression at a particular frequency between points in time where speech is present and points in time where noise only is present, i.e. the fluctuations of the residual noise may be reduced. Distortion of speech would then always occur at least to the extent determined by H_(max). However, in order to reduce the distortion of speech, as well as improve the possibility of obtaining efficient reduction of the fluctuations of the noise attenuation, it is advantageous to introduce a maximum desired frequency response H_(max)(ω) that varies with both frequency and time.

The value of H_(max)(ω) determined in step 205 of FIG. 2 can for example be derived based on a measure of the noise level of the noisy speech signal y(t), such as the signal-to-noise ratio SNR(ω) of the noisy speech signal y(t), the SNR(ω) of the speech component estimate ŝ(t) at different frequencies, or the overall signal-to-noise ratio SN̂R(t) of the speech component estimate ŝ(t) etc., where “overall” means that an integration is performed over the relevant frequency band (cf. expression (14) below). Other measures could alternatively be used for determining H_(max)(ω). Such other measures should preferably be related to a signal-to-noise ratio. For example, the determination of H_(max)(ω) can be based on the noise power level P_(n)(t,ω) of the noisy speech signal y(t) at different frequencies, or on the overall noise level P̂_(n)(t) of the noisy speech signal. Measures of the noise power level of the signal y(t) can be seen as measures of a signal-to-noise ratio, where the signal power is assumed to be of a certain value. The value of H_(max)(ω) could alternatively be based on the power level of the noisy speech signal y(t), or on any other measure of the noisy speech signal y(t).

H_(max) Based on a Worst Case Consideration of SNR(t,ω)

Since the SNR of the estimated speech component ŝ(t) obtained for a particular time period depends on H(ω) when H(ω) varies over that time period (see below), an expression for H_(max)(ω) can for example be derived from a worst case consideration of the SNR(ω) of the speech component estimate ŝ(t).

The SNR(ω) of the speech component estimate ŝ(t) can be expressed as:

$\begin{matrix}{{{SNR}(\omega)} = {\frac{{\hat{\Phi}}_{\overset{̑}{s}}(\omega)}{{\hat{\Phi}}_{n^{residual}}(\omega)} \approx \frac{{H(\omega)}\{ {{{\hat{\Phi}}_{y}(\omega)} - {{\hat{\Phi}}_{n}(\omega)}} \}}{{H(\omega)}{{\hat{\Phi}}_{n}(\omega)}}}} & (8)\end{matrix}$

where Φ̂_(ŝ), Φ̂_(y), Φ̂_(n) are estimates of the spectral densities of the estimated speech component ŝ(t), the noisy speech signal y(t) and the noise component n(t), respectively, and Φ̂_(n^(residual))(ω) is an estimate of the spectral density of the residual noise, n^(residual)(t).

Instantaneously, the SNR(ω) of ŝ(t) for a certain frequency ω is independent of H(ω) (and equal to the SNR of y(t) at that frequency), assuming that H(ω)>0 for all ω, as can be seen from expressions (1)-(3) and (8) above. However, in contrast to the instantaneous SNR, the SNR for a certain time period is typically dependent on H(ω) when H(ω) varies over that time period. To illustrate this, the following simple example is considered, wherein the SNR is determined based on two samples y(t_(A)) and y(t_(B)), collected at two different time instants t_(A) and t_(B), and wherein the sample obtained at t_(A) contains noisy speech: y(t_(A))=s(t_(A))+n(t_(A)) and the sample at t_(B) contains only noise: y(t_(B))=n(t_(B)). Assuming that the desired frequency response H(ω) for a certain frequency ω takes different values at the different moments in time, such that H(t_(A),ω)≠H(t_(B),ω), the SNR of ŝ(t) for the frequency ω based on these two samples could be expressed as:

$\begin{matrix}{{{SNR}(\omega)} = {\frac{{{\hat{\Phi}}_{\overset{̑}{s}}( {t_{A},\omega} )} + 0}{\begin{matrix}{{{\hat{\Phi}}_{n^{residual}}( {t_{A},\omega} )} +} \\{{\hat{\Phi}}_{n^{residual}}( {t_{A},\omega} )}\end{matrix}} \approx {\frac{{H( {t_{A},\omega} )}\begin{Bmatrix}{{{\hat{\Phi}}_{y}( {t_{A},\omega} )} -} \\{{\hat{\Phi}}_{n}( {t_{A},\omega} )}\end{Bmatrix}}{\begin{matrix}{{{H( {t_{A},\omega} )}{{\hat{\Phi}}_{n}( {t_{A},w} )}} +} \\{{H( {t_{B},\omega} )}{{\hat{\Phi}}_{n}( {t_{B},\omega} )}}\end{matrix}}.}}} & ( {8a} )\end{matrix}$

The SNR in expression (8a) is clearly dependent on H(ω), since H(t_(B),ω) is only present in the denominator of expression (8a).
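The dependence may be made concrete with a small numerical sketch of the right-hand side of expression (8a), evaluated as written. The function name and the power values are hypothetical. When the response is the same at both time instants it cancels, whereas a lower response at the noise-only instant t_(B) raises the two-sample SNR.

```python
def two_sample_snr(h_a, h_b, phi_y_a, phi_n_a, phi_n_b):
    """Right-hand side of expression (8a) for one frequency.

    h_a, h_b: values of H at the instants t_A (noisy speech) and t_B
    (noise only); phi_*: estimated power spectral densities.
    """
    return h_a * (phi_y_a - phi_n_a) / (h_a * phi_n_a + h_b * phi_n_b)

# Hypothetical powers: noisy speech of power 10 at t_A, noise of power 1
# at both instants.
equal_full = two_sample_snr(1.0, 1.0, 10.0, 1.0, 1.0)
equal_half = two_sample_snr(0.5, 0.5, 10.0, 1.0, 1.0)
noise_suppressed = two_sample_snr(1.0, 0.1, 10.0, 1.0, 1.0)
```

Here equal_full and equal_half coincide, since a response that is constant over the two instants cancels in the ratio, while noise_suppressed is larger because only the noise-only sample has been attenuated.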

A worst case SNR is obtained when it is assumed that speech is maximally attenuated and noise is minimally attenuated. For a frequency ω, this can be denoted as

$\begin{matrix}{SNR_{\text{worst case}}(\omega) \approx \frac{H_{\min}^{2}\left( \hat{\Phi}_{y}(\omega) - \hat{\Phi}_{n}(\omega) \right)}{H_{\max}^{2}(\omega)\hat{\Phi}_{n}(\omega)}} & (9)\end{matrix}$

In order to limit the worst case SNR, a minimum value β of the worst case SNR may be provided, where β may be a function of frequency:

$\begin{matrix}{SNR_{\text{worst case}}(\omega) = \frac{H_{\min}^{2}\left( \hat{\Phi}_{y}(\omega) - \hat{\Phi}_{n}(\omega) \right)}{H_{\max}^{2}(\omega)\hat{\Phi}_{n}(\omega)} \geq \beta(\omega)} & (10)\end{matrix}$

In expression (10), β(ω) forms a lower limit for the worst case SNR. β will in the following be referred to as the tolerance threshold. The tolerance threshold β should preferably be given a value greater than zero for all frequencies.

Expression (10) yields the following expression for the maximum level of H(ω):

$\begin{matrix}{{H_{\max}(\omega)} \leq \sqrt{\frac{H_{\min}^{2}}{\beta (\omega)}\frac{{{\hat{\Phi}}_{y}(\omega)} - {{\hat{\Phi}}_{n}(\omega)}}{{\hat{\Phi}}_{n}(\omega)}}} & (11)\end{matrix}$

By defining H_(max)(ω)=0 for the special cases where H_(min)=0 or Φ̂_(y)(ω)=Φ̂_(n)(ω), these cases will also be covered by (11).

Since it is desirable that H(ω), and thereby also H_(max)(ω), is as large as possible in order to minimize the speech distortion, (11) can be reduced to

$\begin{matrix}{{H_{\max}(\omega)} = \sqrt{\frac{H_{\min}^{2}}{\beta (\omega)}\frac{{{\hat{\Phi}}_{y}(\omega)} - {{\hat{\Phi}}_{n}(\omega)}}{{\hat{\Phi}}_{n}(\omega)}}} & (12)\end{matrix}$

The tolerance threshold β(ω) defines a limit for how small the worst case SNR may be. β(ω) may take any value greater than zero. In noise suppression applications for mobile communication, the value of β(ω) could for example lie within the range −10 to 10 dB. A typical value of β(ω) in such applications could be −3 dB, which has proven to reduce the fluctuations of the residual noise to a level where the residual noise is unnoticeable for most values of H_(min)(ω), at a reasonable speech distortion cost.
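A minimal sketch of how expression (12) might be evaluated per frequency bin is given below. It assumes that a tolerance threshold stated in dB is converted to a linear power ratio as 10^(dB/10), that bins in which the noise estimate is not smaller than the noisy-signal estimate are given H_(max)(ω)=0 (extending the special case defined above to negative differences), and that a bin with a zero noise estimate imposes no limit. The function name and all numerical values are hypothetical.

```python
import math

def h_max_worst_case(phi_y, phi_n, h_min, beta_db=-3.0):
    """Evaluate expression (12) for each frequency bin.

    beta_db: tolerance threshold in dB, converted to a linear power ratio
    (an assumption about how the dB figures are to be read).
    """
    beta = 10.0 ** (beta_db / 10.0)
    limits = []
    for py, pn in zip(phi_y, phi_n):
        if h_min == 0.0 or py <= pn:
            # Special cases: H_min = 0 or no estimated speech power.
            limits.append(0.0)
        elif pn <= 0.0:
            # Assumption: without estimated noise there is nothing to limit.
            limits.append(math.inf)
        else:
            limits.append(math.sqrt(h_min ** 2 / beta * (py - pn) / pn))
    return limits

# Hypothetical bins evaluated with beta = 0 dB (a linear ratio of 1).
limits = h_max_worst_case([101.0, 5.0, 1.0], [1.0, 1.0, 1.0],
                          h_min=0.1, beta_db=0.0)
```

Lowering beta_db relaxes the limit (a larger H_(max)(ω), hence less speech distortion), while raising it tightens the limit and favours steadier residual noise.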

The tolerance threshold could for example be selected according to

β(ω)=f(D _(acceptable) ^(noise))  (13a)

or

β(ω)=g(D _(acceptable) ^(speech))  (13b)

where f is an increasing function, g is a decreasing function, D_(acceptable) ^(noise) is the acceptable distortion of the noise, and D_(acceptable) ^(speech) is the acceptable distortion of the speech (relations from which a value of D^(noise) and D^(speech) may be obtained are given in expressions (21) and (22) below).

β(ω) may also take a constant value over parts of, or the entire, frequency range. If minimisation of the residual noise distortion is given higher priority than the minimisation of the speech distortion, β should preferably be given a high value, for example in the order of +3 dB. If, on the other hand, a minimisation of speech distortion is more important than a minimisation of the residual noise distortion, then β should preferably be given a lower value, for example in the order of −7 dB.

In one implementation of the invention, the value of β(ω) could depend on whether or not the noisy speech signal contains a speech component at a particular time and frequency. If there is no speech component at the particular frequency, the value of β(ω) could be set to a comparatively high value, and when a speech component appears at this particular frequency, the value of β(ω) could advantageously be slowly decreased to a considerably smaller value. By decreasing the value of β(ω) slowly upon the presence of speech, efficient noise suppression is obtained at times when no speech is present, and the resulting distortion of speech at the particular frequency is gradually reduced in a manner so that a human ear listening to the signal does not notice the gradual change in the filtering of the speech component estimate.

H_(max) Based on the Overall Signal to Noise Ratio S̅N̅R̅

As mentioned above, H_(max)(ω) may be determined based on a consideration of the overall signal to noise ratio S̅N̅R̅, where

$\begin{matrix}{\overline{SNR} = \frac{\int_{\omega_{1}}^{\omega_{2}} \left\{ \hat{\Phi}_{y}(\omega) - \hat{\Phi}_{n}(\omega) \right\} d\omega}{\int_{\omega_{1}}^{\omega_{2}} \hat{\Phi}_{n}(\omega)\, d\omega}} & (14)\end{matrix}$

A value of H_(max) may for example be obtained from the following expression:

H_(max)=a[S̅N̅R̅]^(b)+c  (15),

or from the following expression:

H_(max)=a log₂[S̅N̅R̅]+b  (16)
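Expressions (14) to (16) may be sketched as follows. The integrals of expression (14) are approximated by sums over uniformly spaced frequency bins between ω₁ and ω₂, so that the bin width cancels in the ratio. The function names and the values chosen for the constants a, b and c are hypothetical and serve only to make the example concrete.

```python
import math

def overall_snr(phi_y, phi_n):
    """Discrete approximation of expression (14) on uniformly spaced bins."""
    speech = sum(py - pn for py, pn in zip(phi_y, phi_n))
    noise = sum(phi_n)
    return speech / noise

def h_max_power_law(snr, a, b, c):
    """Expression (15): H_max = a * SNR^b + c."""
    return a * snr ** b + c

def h_max_log(snr, a, b):
    """Expression (16): H_max = a * log2(SNR) + b."""
    return a * math.log2(snr) + b

# Hypothetical estimates for three bins spanning the band of interest.
snr = overall_snr([2.0, 4.0, 6.0], [1.0, 1.0, 1.0])

# Hypothetical constants, evaluated at an overall SNR of 4.
h_max_15 = h_max_power_law(4.0, a=0.2, b=0.5, c=0.1)
h_max_16 = h_max_log(4.0, a=0.1, b=0.5)
```

In both mappings, a positive a makes H_(max) grow with the overall SNR, so that less limiting is applied when the signal is comparatively clean.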

H_(max) Based on the Noise Power Level P_(n)(ω)

Furthermore, a value of H_(max)(ω) may alternatively be determined based on a consideration of the noise power level P_(n)(ω), for example by one of the relations provided in expression (17) or (18):

H_(max)(ω)=a[P_(n)(ω)]^(−b)+c  (17)

H_(max)(ω)=a log₂[P_(n)(ω)]+b  (18)

H_(max) Based on the Overall Noise Power Level P̅_(n)

H_(max)(ω) may alternatively be determined based on a consideration of the overall noise power level P̅_(n), where P̅_(n) is the noise power level measured over a frequency region between ω₁ and ω₂.

A value of H_(max) may for example be obtained from the following expression:

H_(max)=a[P̅_(n)]^(−b)+c  (19),

or from the following expression:

H_(max)=a log₂ P̅_(n)+b  (20)

In expressions (15)-(20) above, a, b and c represent constants for which appropriate values may be derived experimentally. Other methods of determining the maximum level H_(max) of the desired frequency response could also be used.

An embodiment of the desired response determination apparatus 110 according to the invention is illustrated in FIG. 3. The desired response determination apparatus 110 of FIG. 3 comprises a response approximation determination apparatus 300, a maximum response determination apparatus 305 and a minimum selector 310. The response approximation determination apparatus 300 is arranged to operate on a signal fed to the input 315 of the desired response determination apparatus 110, i.e. typically on the linear transform Y(ω) of the noisy speech signal. Furthermore, the response approximation determination apparatus 300 is arranged to determine an approximation H^(approx)(ω) of the desired frequency response based on the input signal. H^(approx)(ω) can advantageously be determined in a conventional manner for determining the desired frequency response, for example according to expression (4) above.

The maximum response determination apparatus 305 of FIG. 3 is arranged to determine a maximum level of the desired frequency response, H_(max)(ω). In many embodiments of the invention, the maximum response determination apparatus 305 will be arranged to receive and operate upon the linear transform Y(ω), or receive and operate upon the noisy speech signal y(t), in order to determine H_(max)(ω), for example according to any of expressions (12) or (15)-(20) above. (In the embodiment of FIG. 3, maximum response determination apparatus 305 is arranged to receive the linear transform Y(ω).) However, in other embodiments, H_(max)(ω) will be determined in other ways, one of them being that H_(max)(ω) takes a constant value, and the connection between the input to the desired response determination apparatus 110 and the maximum response determination apparatus 305 shown in FIG. 3 may be omitted.

In the apparatus shown in FIG. 3, the output of the response approximation determination apparatus 300, from which a signal representing H^(approx)(ω) will be delivered, and the output of the maximum response determination apparatus 305, from which a signal representing H_(max)(ω) will be delivered, are both connected to an input of minimum selector 310. The minimum selector 310 is arranged to compare the signal representing H_(max)(ω) and the signal representing H^(approx)(ω), and to select the lower of H_(max)(ω) and H^(approx)(ω). The minimum selector 310 is then arranged to output the lower of H_(max)(ω) and H^(approx)(ω). The output of minimum selector 310 represents the value of the desired frequency response H(ω), and the output of the minimum selector 310 is connected to the output 320 of the desired frequency response determination apparatus 110 so that the value representing the desired frequency response H(ω) can be fed to the output 320.

The desired response determination apparatus 110 of FIG. 3 may include other components, not shown in FIG. 3, such as a maximum selector arranged to compare a value of the frequency response to the minimum level of the desired frequency response, H_(min)(ω), and to select the maximum of such compared values. Such a maximum selector could advantageously be arranged to compare H_(min)(ω) to the output of the minimum selector 310, in which case the output of the maximum selector could advantageously be connected to the output 320 of the desired response determination apparatus 110. Alternatively, such a maximum selector could be arranged to compare H_(min)(ω) to the output from the response approximation determination apparatus 300, in which case the output of the maximum selector could advantageously be connected to the input of the minimum selector 310, instead of connecting the output of the response approximation determination apparatus 300 to the minimum selector 310 (cf. expressions (6a) and (6b) above). A desired response determination apparatus 110 could furthermore include other components such as buffers etc.
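The two placements of the maximum selector can be compared in a short Python sketch. The function names and numerical values are hypothetical, and no mapping of either function to expression (6a) or (6b) is implied; the sketch merely shows that the two arrangements give the same result whenever H_(max) is not lower than H_(min), and differ otherwise.

```python
def max_selector_after_min(h_approx, h_max, h_min):
    """Maximum selector fed by the output of the minimum selector."""
    return max(min(h_approx, h_max), h_min)


def max_selector_before_min(h_approx, h_max, h_min):
    """Maximum selector fed by H_approx, its output feeding the minimum selector."""
    return min(max(h_approx, h_min), h_max)


# Hypothetical linear-scale gains (illustration only).
h_min, h_max = 0.18, 0.40
for h_approx in (0.05, 0.30, 0.95):
    a = max_selector_after_min(h_approx, h_max, h_min)
    b = max_selector_before_min(h_approx, h_max, h_min)
    print(h_approx, a, b)  # a == b for every value, since h_max >= h_min

# If h_max were lower than h_min, the last selector in the chain would win.
print(max_selector_after_min(0.5, 0.10, 0.18))   # 0.18
print(max_selector_before_min(0.5, 0.10, 0.18))  # 0.1
```

When the maximum level is formed as in claim 9, where H_(max)(ω) is itself the larger of a computed value and H_(min), the second situation does not arise and both arrangements are equivalent.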

The desired frequency response determination apparatus 110 can advantageously be implemented by suitable computer software and/or hardware, as part of a filter design apparatus 100. A filter design apparatus 100 according to the invention can advantageously be implemented in user equipments for transmission of speech, such as mobile telephones, fixed line telephones, walkie-talkies etc. The filter design apparatus 100 may furthermore be implemented in other types of user equipments where acoustic signals are processed, such as camcorders, dictaphones, etc. In FIG. 4 a, a user equipment 400 comprising a filter design apparatus according to the invention is shown. A user equipment 400 could be arranged to perform noise suppression in accordance with the invention upon recording of an acoustic signal, and/or upon re-play of an acoustic signal that has been recorded at a different time and/or by a different user equipment.

Moreover, a filter design apparatus 100 according to the invention can advantageously be implemented in intermediary nodes in a communications system where it is desired to perform noise suppression, such as in a Media Resource Function Processor (MRFP) in an IP-Multimedia Subsystem (IMS system), in a Mobile Media Gateway etc. FIG. 4 b shows a communications system 405 including a node 410 comprising a filter design apparatus 100 according to the invention.

Table 1, as well as FIGS. 5 a and 5 b, illustrate simulation results obtained by determining the desired frequency response H(t′,ω′) for a particular time t′ and frequency ω′ according to expression (4a) above (FIG. 5 a), and by determining the desired frequency response H(t′,ω′) according to an embodiment of the invention (FIG. 5 b). In FIG. 5 b, H(t′,ω′) is determined by use of expression (6a), where H_(max)(t′,ω′) is obtained by use of expression (12), where β(ω′)=3 dB, and H^(approx)(t′,ω′) is obtained by expression (4). In FIG. 5 a, the method used to obtain H(t′,ω′) imposes no upper limit on H(t′,ω′) other than unity gain, i.e. H_(max)²=0 dB, in a conventional manner. In both the simulations presented in FIG. 5 a and those presented in FIG. 5 b, the following values of the relevant parameters are used: δ(t′,ω′)=1, γ₁=γ₂=1, H_(min)²=−15 dB, and the SNR of y(t′) at the current time and frequency is 10 dB.
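The parameter values above can be turned into numbers with a small Python sketch. Several assumptions are made and are not confirmed by the text: that for δ=1 and γ₁=γ₂=1 the conventional response reduces to plain power spectral subtraction, H²=1−Φ̂_n/Φ̂_y; that the maximum level has the form recited in claim 9; that the noisy speech power is the sum of the speech and noise powers; and that the SNR is the ratio of speech power to noise power. All function and variable names are hypothetical.

```python
import math


def db_to_lin(db):
    """Convert a power ratio from dB to linear scale."""
    return 10 ** (db / 10)


def lin_to_db(x):
    """Convert a linear power ratio to dB."""
    return 10 * math.log10(x)


snr_db = 10.0        # SNR of y(t') at the current time and frequency
h_min_sq_db = -15.0  # minimum level H_min^2
beta_db = 3.0        # tolerance threshold beta
delta = 1.0          # subtraction factor delta

snr = db_to_lin(snr_db)           # Phi_s / Phi_n (assumed definition)
noise_to_noisy = 1 / (1 + snr)    # Phi_n / Phi_y, assuming Phi_y = Phi_s + Phi_n

# Assumed conventional response: power spectral subtraction.
h_approx_sq = 1 - delta * noise_to_noisy

# Assumed maximum level, squared form of the claim 9 formula:
# H_max^2 = max(H_min^2 / beta * (Phi_y - Phi_n) / Phi_n, H_min^2)
h_min_sq = db_to_lin(h_min_sq_db)
h_max_sq = max(h_min_sq / db_to_lin(beta_db) * snr, h_min_sq)

# Limited response: minimum selection followed by the lower clamp.
h_sq = max(min(h_approx_sq, h_max_sq), h_min_sq)

print(round(lin_to_db(h_approx_sq), 2))  # -0.41
print(round(lin_to_db(h_sq), 2))         # -8.0
```

Under these assumptions the sketch yields approximately −0.41 dB for the conventional response and −8 dB for the limited response, which agrees with the values of H²(t′,ω′) listed in Table 1.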

The following expression can be used as a measure of the distortion of the residual noise, D^(noise):

$\begin{matrix}{D^{noise} = \frac{H^{2}(\omega)}{H_{\min}^{2}}} & (21)\end{matrix}$

while the distortion of the speech, D^(speech), may be expressed as:

$\begin{matrix}{D^{speech} = {\frac{1}{H^{2}(\omega)}.}} & (22)\end{matrix}$

D^(noise) could also be used as a measure of the fluctuations of the residual noise.

In FIGS. 5 a and 5 b, five different signal levels are indicated:

1: The power spectral density Φ̂_y(t′,ω′) of the noisy speech signal y(t′)

2: The power spectral density Φ̂_n(t′,ω′) of the noise component n(t′)

3: Desired noise level, Φ̂_n(t′,ω′)−H_(min)²

4: Power spectral density of speech component estimate ŝ(t′): Φ̂_y(t′,ω′)−H²(t′,ω′)

5: Power spectral density of the residual noise n_(residual)(t′): Φ̂_n(t′,ω′)−H²(t′,ω′)

Furthermore, a number of different signal level differences are indicated in FIGS. 5 a and 5 b:

A: SNR(t) of the noisy speech signal y(t′) as well as of speech component estimate ŝ(t′) (10 dB)

B: H_(min)² (15 dB)

C: Speech distortion: −H²(t′,ω′)

D: Residual noise distortion, H_(min)²−H²(t′,ω′)

E: H²(t′,ω′)

In Table 1, values of D^(noise) and D^(speech), as well as values of the worst case signal-to-noise ratio, are given as obtained by the conventional method of determining H(ω) illustrated in FIG. 5 a, and the inventive method illustrated in FIG. 5 b.

TABLE 1. A comparison of the noise suppression obtained by a conventional noise suppression method and the noise suppression method according to an embodiment of the invention.

                  H(t′,ω′) determined      H(t′,ω′) determined
                  according to (4a)        according to (6) and (12)
H²(t′,ω′)         −0.41 dB                 −8 dB
D^(noise)         14.59 dB                 7 dB
D^(speech)        0.41 dB                  8 dB
Worst case SNR    −4.59 dB                 3 dB
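The entries of Table 1 can be checked with a few lines of Python that evaluate expressions (21) and (22) on a decibel scale, where the ratios become differences. The function name is hypothetical. The worst case SNR is computed here as the SNR minus D^(noise); this relation is an assumption inferred from the tabulated values and is not stated in the text.

```python
def distortion_measures_db(h_sq_db, h_min_sq_db, snr_db):
    """Return (D_noise, D_speech, worst-case SNR), all in dB."""
    d_noise_db = h_sq_db - h_min_sq_db      # expression (21) in dB
    d_speech_db = -h_sq_db                  # expression (22) in dB
    worst_case_snr_db = snr_db - d_noise_db  # assumed relation, see lead-in
    return d_noise_db, d_speech_db, worst_case_snr_db


# H^2 values from Table 1, with H_min^2 = -15 dB and SNR = 10 dB.
for label, h_sq_db in (("conventional", -0.41), ("limited", -8.0)):
    d_noise, d_speech, worst = distortion_measures_db(h_sq_db, -15.0, 10.0)
    print(f"{label}: D_noise={d_noise:.2f} dB, "
          f"D_speech={d_speech:.2f} dB, worst-case SNR={worst:.2f} dB")
```

With the H² values of Table 1 as input, the sketch reproduces the remaining rows of the table for both methods.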

From the simulation results illustrated by FIGS. 5 a and 5 b as well as Table 1, it is clear that the residual noise distortion and the worst case SNR obtained by the inventive method are better than those obtained by a conventional noise suppression technique: D^(noise) is reduced from 14.59 dB to 7 dB, and the worst case SNR is increased from −4.59 dB to 3 dB. This improvement is generally obtained at the cost of an increase in speech distortion, here from 0.41 dB to 8 dB. In many cases, however, an increase in speech distortion is acceptable if the fluctuations in the residual noise are reduced. Furthermore, it is clear from the above that the effects of the trade-offs made according to the invention between the distortions in the residual noise and the speech can easily be computed. Hence, a decision on whether or not to apply the inventive method for selecting the desired frequency response of a filter arrangement can be made based on an analysis of what consequences the application of the inventive method would have on the speech distortion versus the residual noise distortion. Such an analysis could be made from time to time, and a decision on whether or not to apply the inventive method of determining H(ω) could be made based on the analysis. If it is found that a switch-over from a conventional manner of determining H(ω) to a method according to the invention would be appropriate, such a switch-over could advantageously be made gradually, in order to achieve a seamless transition that is not noticeable to the listener.
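The text does not specify how a gradual switch-over is to be carried out. One conceivable approach, shown here purely as an assumption, is to interpolate linearly on a decibel scale between the conventionally determined response and the limited response over a number of frames. The function name, the frame count and the use of a linear ramp are all hypothetical.

```python
def crossfade_gain_db(h_conventional_db, h_limited_db, frame, n_frames):
    """Blend two gains (in dB) with a linear ramp over n_frames frames."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    # alpha runs from 0 (conventional only) to 1 (limited only).
    alpha = min(max(frame / n_frames, 0.0), 1.0)
    return (1 - alpha) * h_conventional_db + alpha * h_limited_db


# Transition from the conventional gain to the limited gain of Table 1
# over four frames (the frame count is an arbitrary illustration value).
for frame in range(5):
    print(frame, crossfade_gain_db(-0.41, -8.0, frame, 4))
```

The ramp starts at the conventional value and ends at the limited value, so the gain applied to the signal changes by only a fraction of the total difference in each frame.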

By the invention, a flexible and computationally simple way of determining the desired frequency response H(ω) of a digital filter is obtained. By applying the method, fluctuations of the residual noise may be reduced in a controlled manner, and the necessary trade-off between the amount of fluctuations in the residual noise and the speech distortion becomes rather simple. The invention can successfully be applied to any noise reduction method based on spectral subtraction.

In the above, the invention has been discussed in terms of the noise suppression of noisy speech signals. However, the invention can also advantageously be applied for noise suppression in other types of acoustic recordings. The signal y(t) in which the noise is to be suppressed is in the above referred to as a noisy speech signal, but could be any type of noisy acoustic recording.

One skilled in the art will appreciate that the present invention is not limited to the embodiments disclosed in the accompanying drawings and the foregoing detailed description, which are presented for purposes of illustration only, but it can be implemented in a number of different ways, and it is defined by the following claims.

1. A method of designing a digital filter (h(z)) for noise suppression of a signal to be filtered (y(t)) wherein the signal represents an acoustic recording, the method comprising: determining a desired frequency response (H(ω)) of the digital filter, such that the desired frequency response does not exceed a maximum level, wherein the maximum level is determined in response to the signal to be filtered; and generating a noise suppression filter based on the desired frequency response.
2. The method of claim 1, wherein the maximum level of the frequency response is a function of frequency.
3. The method of claim 1, wherein the determining of a desired frequency response comprises: determining a maximum level (H_(max)(ω)) of the frequency response; determining an approximation (H^(approx)(ω)) of the frequency response; comparing the approximation with the maximum level; and selecting said maximum level as the value of the desired frequency response for a frequency for which the value of the maximum level is lower than the value of the approximation of the frequency response.
4. The method of claim 3, wherein the steps of determining an approximation, determining a maximum level, comparing and selecting are repeated for at least two different frequency bins.
5. The method of claim 1, wherein the determining of the desired frequency response is performed in a manner so that the desired frequency response does not take a value lower than a minimum level of the desired frequency response.
6. The method of claim 5, wherein the maximum level is determined in dependence of the minimum level.
7. The method of claim 1, wherein the maximum level is determined based on a measure of a noise level of the signal to be filtered.
8. The method of claim 7, wherein the maximum level at a particular frequency is determined in dependence of an estimate of the signal-to-noise ratio of the signal to be filtered at the particular frequency.
9. The method of claim 8, wherein the maximum level is generated as a value corresponding to the numerical value of: $H_{\max}(\omega) = \max\left\{ \sqrt{\frac{H_{\min}^{2}}{\beta}\,\frac{\hat{\Phi}_{y}(\omega) - \hat{\Phi}_{n}(\omega)}{\hat{\Phi}_{n}(\omega)}},\; H_{\min} \right\}$ wherein H_(max)(ω) is the maximum level as a function of frequency, H_(min) is a minimum level of the frequency response and β is a tolerance threshold representing the maximum acceptable signal-to-noise ratio.
10. The method of claim 9, wherein the value of the tolerance threshold depends on the frequency for which the maximum level is determined.
11. The method of claim 7, wherein the maximum level is determined in dependence of an estimate of the overall value of the signal-to-noise ratio.
12. The method of claim 7, wherein the maximum level at a particular frequency is determined in dependence of an estimate of the noise power of the signal to be filtered at the particular frequency.
13. The method of claim 7, wherein the maximum level is determined in dependence of an estimate of the noise power of the signal.
14. A digital filter design apparatus arranged to design a digital filter (h(z)) for noise suppression of a signal to be filtered (y(t)) wherein the signal represents an acoustic recording, the digital filter design apparatus comprising: a desired frequency response determination apparatus arranged to determine a desired frequency response (H(ω)) in response to the signal to be filtered by: determining a maximum level (H_(max)(ω)) of the desired frequency response in dependence of the signal to be filtered; and determining the desired frequency response in a manner so that the desired frequency response does not exceed the maximum level.
15. The digital filter design apparatus of claim 14, wherein the desired frequency response determination apparatus is arranged to determine the maximum level of the desired frequency response as a function of frequency.
16. The digital filter design apparatus of claim 15, wherein the desired frequency response determination apparatus is arranged to: determine an approximation (H^(approx)(ω)) of the desired frequency response; compare the approximation of the frequency response with the determined maximum level; and select the lower of the maximum level and the approximation of the desired frequency response as the value of the desired frequency response.
17. The digital filter design apparatus of claim 16, wherein the desired frequency response apparatus is arranged to compare and select on a per frequency bin basis.
18. The digital filter design apparatus of claim 14, wherein the desired frequency response apparatus is arranged to determine the desired frequency response in a manner so that the desired frequency response does not take a value lower than a minimum level.
19. The digital filter design apparatus of claim 18, wherein the desired frequency response apparatus is arranged to determine the maximum level in dependence of the minimum level.
20. The digital filter design apparatus of claim 14, wherein the desired frequency response apparatus is arranged to determine the maximum level based on a measure of the noise level of the signal to be filtered.
21. A user equipment for processing of an acoustic signal, the user equipment including a digital filter design apparatus arranged to design a digital filter (h(z)) for noise suppression of a signal to be filtered (y(t)) wherein the signal represents an acoustic recording, the digital filter design apparatus comprising: a desired frequency response determination apparatus arranged to determine a desired frequency response (H(ω)) in response to the signal to be filtered by: determining a maximum level (H_(max)(ω)) of the desired frequency response in dependence of the signal to be filtered; and determining the desired frequency response in a manner so that the desired frequency response does not exceed the maximum level.
22. A node for relaying of a signal representing voice in a communications system, the node including a digital filter design apparatus arranged to design a digital filter (h(z)) for noise suppression of a signal to be filtered (y(t)) wherein the signal represents an acoustic recording, the digital filter design apparatus comprising: a desired frequency response determination apparatus arranged to determine a desired frequency response (H(ω)) in response to the signal to be filtered by: determining a maximum level (H_(max)(ω)) of the desired frequency response in dependence of the signal to be filtered; and determining the desired frequency response in a manner so that the desired frequency response does not exceed the maximum level.
23. A computer-readable medium including program code for designing a digital filter (h(z)) for noise suppression of a signal (y(t)) to be filtered wherein the signal represents an acoustic recording, the program code portions adapted to, when run on a computer, determine a desired frequency response (H(ω)) of the digital filter; generate a noise suppression filter based on the desired frequency response; and determine the desired frequency response in a manner so that the desired frequency response does not exceed a maximum level, wherein the maximum level is determined in response to the signal to be filtered.